Exploratory Data Analysis by Allan Visochek

Using the Prosper Loan Company Dataset

06-01-2015

The following exploratory data analysis is an investigation of the factors influencing the borrower rate of loans from the Prosper Loan Company from the second quarter of 2006 through the first quarter of 2014.

Univariate Plots Section

——————————-

BorrowerRate Histogram & Summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

BorrowerRate ranges from 0 to about 0.5 (the maximum is 0.4975).

BorrowerRate Quantiles:

##     10% 
## 0.09886
##    90% 
## 0.3099

Most BorrowerRates are between .099 and .310
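The 10% and 90% quantiles above can be computed with base R's `quantile()`. The sketch below uses a small made-up vector named `rates` as a stand-in for the BorrowerRate column, so the numbers it prints are illustrative only and are not the report's values.

```r
# Hypothetical stand-in for the BorrowerRate column (not the Prosper data)
rates <- c(0.05, 0.09, 0.12, 0.15, 0.18, 0.20, 0.25, 0.29, 0.32, 0.35)

# 10th and 90th percentiles, as in the quantile output above
rate_bounds <- quantile(rates, probs = c(0.10, 0.90))
print(rate_bounds)

# Share of observations that fall inside the 10%-90% band
share_inside <- mean(rates >= rate_bounds[1] & rates <= rate_bounds[2])
print(share_inside)
```

By construction, roughly 80% of the observations fall between the two bounds, which is what "most" means in the statement above.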

LoanOriginalAmount Histogram and Summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

All loans are between 1,000 and 35,000.

75% of loans are for 12,000 or less.

Loan Term, Bar Chart

All loans have a term of either 1, 3, or 5 years.

Employment Status Summary

##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134

The large majority of loans go to individuals who have a status of either “Employed” or “Full-time”. No surprises here.

Homeowner Summary

## False  True 
## 56459 57478

About half of the Borrowers are homeowners

Borrower State Summary

##          AK    AL    AR    AZ    CA    CO    CT    DC    DE    FL    GA 
##  5515   200  1679   855  1901 14717  2210  1627   382   300  6720  5008 
##    HI    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI 
##   409   186   599  5921  2078  1062   983   954  2242  2821   101  3593 
##    MN    MO    MS    MT    NC    ND    NE    NH    NJ    NM    NV    NY 
##  2318  2615   787   330  3084    52   674   551  3097   472  1090  6729 
##    OH    OK    OR    PA    RI    SC    SD    TN    TX    UT    VA    VT 
##  4197   971  1817  2972   435  1122   189  1737  6842   877  3278   207 
##    WA    WI    WV    WY 
##  3048  1842   391   150

There is a pretty even distribution of borrowers among the States, when taking state population into account.

Occupation Summary

##                                                        Accountant/CPA 
##                               3588                               3233 
##           Administrative Assistant                            Analyst 
##                               3688                               3602 
##                          Architect                           Attorney 
##                                213                               1046 
##                          Biologist                         Bus Driver 
##                                125                                316 
##                         Car Dealer                            Chemist 
##                                180                                145 
##                      Civil Service                             Clergy 
##                               1457                                196 
##                           Clerical                Computer Programmer 
##                               3164                               4478 
##                       Construction                            Dentist 
##                               1790                                 68 
##                             Doctor                Engineer - Chemical 
##                                494                                225 
##              Engineer - Electrical              Engineer - Mechanical 
##                               1125                               1406 
##                          Executive                            Fireman 
##                               4311                                422 
##                   Flight Attendant                       Food Service 
##                                123                               1123 
##            Food Service Management                          Homemaker 
##                               1239                                120 
##                           Investor                              Judge 
##                                214                                 22 
##                            Laborer                        Landscaping 
##                               1595                                236 
##                 Medical Technician                  Military Enlisted 
##                               1117                               1272 
##                   Military Officer                        Nurse (LPN) 
##                                346                                492 
##                         Nurse (RN)                       Nurse's Aide 
##                               2489                                491 
##                              Other                         Pharmacist 
##                              28617                                257 
##         Pilot - Private/Commercial  Police Officer/Correction Officer 
##                                199                               1578 
##                     Postal Service                          Principal 
##                                627                                312 
##                       Professional                          Professor 
##                              13628                                557 
##                       Psychologist                            Realtor 
##                                145                                543 
##                          Religious                  Retail Management 
##                                124                               2602 
##                 Sales - Commission                     Sales - Retail 
##                               3446                               2797 
##                          Scientist                      Skilled Labor 
##                                372                               2746 
##                      Social Worker         Student - College Freshman 
##                                741                                 41 
## Student - College Graduate Student           Student - College Junior 
##                                245                                112 
##           Student - College Senior        Student - College Sophomore 
##                                188                                 69 
##        Student - Community College         Student - Technical School 
##                                 28                                 16 
##                            Teacher                     Teacher's Aide 
##                               3759                                276 
##              Tradesman - Carpenter            Tradesman - Electrician 
##                                120                                477 
##               Tradesman - Mechanic                Tradesman - Plumber 
##                                951                                102 
##                       Truck Driver                    Waiter/Waitress 
##                               1675                                436

Prosper Rating Summary

##           A    AA     B     C     D     E    HR 
## 29084 14551  5372 15581 18345 14274  9795  6935

Credit Grade Summary

##           A    AA     B     C     D     E    HR    NC 
## 84984  3315  3509  4389  5649  5153  3289  3508   141

A large number of borrowers have no Prosper rating (29,084) or no credit grade (84,984).

Credit Score Histogram and Summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591

Most borrowers have a credit score between 600 and 800

IncomeRange Summary

##             $0      $100,000+      $1-24,999 $25,000-49,999 $50,000-74,999 
##            621          17337           7274          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806

Few loans are given out to individuals with low income, or who are unemployed, no surprises here.

StatedMonthlyIncome Histogram and Summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750000

DebtToIncomeRatio

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

Most borrowers have a DebtToIncomeRatio in the range of 0 to 0.5. Those with a ratio of 10 are probably errors.

Loan Period

We can see here that the number of loans given out dropped dramatically in the last quarter of 2008, likely due to the recession.

Univariate Analysis

———————

What is the structure of your dataset?

There are 113,937 loans in the dataset with 81 features, 13 of which were used in the analysis:

Numerical Variables:

BorrowerRate

CreditScoreRangeLower

DebtToIncomeRatio

LoanOriginalAmount

StatedMonthlyIncome

Ordered factor variables:

(from best to worst / greatest to least…)

Term:

60,36,12

(note that term was a numerical variable but was transformed to a factor variable because it has very few values)

CreditGrade:

AA,A,B,C,D,E,HR,NC,none

ProsperRating..Alpha.:

AA,A,B,C,D,E,HR,none

IncomeRange:

$100,000+ ; $75,000-99,999 ; $50,000-74,999 ; $25,000-49,999 ; $1-24,999 ; $0 ; Not employed ; Not displayed
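As a minimal sketch of how the ordered factors listed above might be built: the data frame `loans` below holds a few made-up rows, and treating blank grades as "none" is an assumption about how missing values were labelled. Levels are given in the same order as the lists above.

```r
# Hypothetical sample rows (not the Prosper data)
loans <- data.frame(
  Term = c(36, 60, 12, 36),
  CreditGrade = c("A", "HR", "AA", ""),
  stringsAsFactors = FALSE
)

# Term has only three distinct values, so convert it to an ordered factor
loans$Term <- factor(loans$Term, levels = c(60, 36, 12), ordered = TRUE)

# Assumption: blank grades are relabelled "none" before ordering
loans$CreditGrade[loans$CreditGrade == ""] <- "none"
grade_levels <- c("AA", "A", "B", "C", "D", "E", "HR", "NC", "none")
loans$CreditGrade <- factor(loans$CreditGrade, levels = grade_levels,
                            ordered = TRUE)

str(loans)
```

Fixing the level order this way means plots and summaries list the categories from best to worst rather than alphabetically.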

Unordered factor variables:

EmploymentStatus:

Employed, Full-time, Not employed, Part-time, Retired, Self-employed, Other, Not available, none

Occupation:

Accountant/CPA, Administrative Assistant, Analyst, Architect, Attorney, Biologist, Bus Driver, Car Dealer, Chemist, Civil Service, Clergy, Computer Programmer, Construction, Dentist, Doctor, Engineer - Chemical, Engineer - Electrical, Engineer - Mechanical, Executive, Fireman, Flight Attendant, Food Service, Food Service Management, Homemaker, Investor, Judge, Laborer, Landscaping, Medical Technician, Military Enlisted, Military Officer, Nurse (LPN), Nurse (RN), Nurse’s Aide, Other, Pharmacist, Pilot - Private/Commercial, Police Officer/Correction Officer, Postal Service, Principal, Professional, Professor, Psychologist, Realtor, Religious, Retail Management, Sales - Commission, Sales - Retail, Scientist, Skilled Labor, Social Worker, Student - College Freshman, Student - College Graduate Student, Student - College Junior, Student - College Senior, Student - College Sophomore, Student - Community College, Student - Technical School, Teacher, Teacher’s Aide, Tradesman - Carpenter, Tradesman - Electrician, Tradesman - Mechanic, Tradesman - Plumber, Truck Driver, Waiter/Waitress

BorrowerState:

AK, AL, AR, AZ, CA, CO, CT, DC, DE, FL, GA, HI, IA, ID, IL, IN, KS, KY, LA, MA, MD, ME, MI, MN, MO, MS, MT, NC, ND, NE, NH, NJ, NM, NV, NY, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, VT, WA, WI, WV, WY

Other observations.

BorrowerRate ranges from 0 to about 0.5.

Most BorrowerRates are between .099 and .310.

Loan amounts range from $1,000 to $35,000.

75% of loans are for $12,000 or less.

All loans have a term of either 1, 3, or 5 years.

The large majority of loans go to individuals who are employed full time.

A large number of borrowers either don’t have a Prosper rating or don’t have a credit grade.

Few loans are given out to individuals with low income, or who are unemployed.

The number of loans given out dropped dramatically in the last quarter of 2008

What is/are the main feature(s) of interest in your dataset?

The main feature of interest for this investigation is the BorrowerRate. Again, this investigation is primarily concerned with the factors influencing the borrower rate.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

It is hard to say at this point which features will be most useful based solely on the univariate plots above. All of the variables mentioned above were selected for the investigation because they are likely to have an impact on the borrower rate.

Did you create any new variables from existing variables in the dataset?

HasCreditGrade -> CreditGrade is available

HasProsperRating -> ProsperRating is available

HasIncome -> IncomeRange is available and greater than 0

Additional variables are created in later sections
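The three indicator variables above could be derived roughly as follows. This is a sketch on made-up rows: it assumes missing grades and ratings appear as empty strings, and that "$0", "Not employed", and "Not displayed" all count as "no income available", which the report does not state explicitly.

```r
# Hypothetical sample rows (not the Prosper data)
loans <- data.frame(
  CreditGrade = c("A", "", "HR"),
  ProsperRating..Alpha. = c("", "B", ""),
  IncomeRange = c("$25,000-49,999", "$0", "Not displayed"),
  stringsAsFactors = FALSE
)

# Assumption: a missing grade or rating is stored as an empty string
loans$HasCreditGrade <- loans$CreditGrade != ""
loans$HasProsperRating <- loans$ProsperRating..Alpha. != ""

# Assumption: these three categories mean no usable, positive income
no_income <- c("$0", "Not employed", "Not displayed")
loans$HasIncome <- !(loans$IncomeRange %in% no_income)

print(loans[, c("HasCreditGrade", "HasProsperRating", "HasIncome")])
```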

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I log transformed StatedMonthlyIncome and DebtToIncomeRatio, which had right-skewed distributions (most values bunched at the low end, with a long tail of large values).

I reordered the following factor variables:

ProsperRating..Alpha.,

CreditGrade,

IncomeRange,

LoanOriginationQuarter

I also subsetted the data for StatedMonthlyIncome by IncomeVerifiable.
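To illustrate why the log transform mentioned above helps, the sketch below applies `log10(x + 1)` (the form used later in the correlation plots) to the five summary values reported earlier for StatedMonthlyIncome. The `+ 1` keeps zero incomes defined, since `log10(0)` is `-Inf`.

```r
# Min, 1st quartile, median, 3rd quartile and max of StatedMonthlyIncome,
# taken from the summary table earlier in this report
income <- c(0, 3200, 4667, 6825, 1750000)

# log10(x + 1) maps 0 to 0 and compresses the long right tail
log_income <- log10(income + 1)
print(round(log_income, 2))

# Spread on the raw scale versus the log scale
print(diff(range(income)))
print(diff(range(log_income)))
```

On the raw scale the maximum dwarfs everything else; on the log scale the quartiles and the maximum sit within a few units of each other, so a histogram becomes readable.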

Bivariate Plots Section

——————————-

Borrower Rate Over Time

The median BorrowerRate decreased significantly from 2012 to 2014. The wide variation in the BorrowerRate over time means it will be important to facet by quarter in the multivariate plots section.

BorrowerRate v. Loan Term

## Term: 12
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0400  0.0929  0.1434  0.1501  0.2064  0.2669 
## -------------------------------------------------------- 
## Term: 36
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1274  0.1815  0.1935  0.2599  0.4975 
## -------------------------------------------------------- 
## Term: 60
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0669  0.1490  0.1870  0.1930  0.2319  0.3304

One-year loans have a slightly lower rate than the others. I will exclude the 1- and 5-year loans, since most loans have a 3-year term (as observed in the univariate plots section).

BorrowerRate v. LoanOriginalAmount

In order to get a better picture, I’m going to cut the LoanOriginalAmount variable into quantiles.

Median BorrowerRate v. LoanOriginalAmount Quantile

Smaller loan amounts have a significantly higher median borrower rate.
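The cut-into-quantiles step can be sketched with base R as below. The data are simulated with an arbitrary downward slope purely so the example runs; the choice of deciles and the names `AmountQuantile` and `median_by_quantile` are assumptions, not the report's actual code.

```r
set.seed(1)

# Simulated stand-in data (not the Prosper data)
n <- 1000
loans <- data.frame(LoanOriginalAmount = round(runif(n, 1000, 35000)))
loans$BorrowerRate <- 0.30 - 4e-06 * loans$LoanOriginalAmount +
  rnorm(n, sd = 0.03)

# Decile break points of the loan amount
breaks <- quantile(loans$LoanOriginalAmount, probs = seq(0, 1, by = 0.1))

# Assign each loan to a decile (1 = smallest loans, 10 = largest)
loans$AmountQuantile <- cut(loans$LoanOriginalAmount, breaks = breaks,
                            include.lowest = TRUE, labels = FALSE)

# Median BorrowerRate within each decile
median_by_quantile <- aggregate(BorrowerRate ~ AmountQuantile,
                                data = loans, FUN = median)
print(median_by_quantile)
```

Plotting the median per bin instead of every point removes the overplotting that hides the trend in a raw scatter plot. If many loans share the same amount, the break points can repeat, in which case `unique(breaks)` is needed before calling `cut()`.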

BorrowerRate v. DebtToIncomeRatio

Again, I’m going to cut the DebtToIncomeRatio variable into quantiles for a cleaner plot.

Median BorrowerRate v. DebtToIncomeRatio Quantile

We can see from this graph that borrowers with a higher debt level relative to the others tend to have a higher borrower rate.

BorrowerRate v. CreditScore

BorrowerRate v. CreditGrade

Both CreditGrade and CreditScore are very highly correlated with the BorrowerRate. CreditGrade is likely derived from the CreditScore.

This is verified in the plot below.

Credit Score v. Credit Grade

Borrower Rate v. Prosper Rating

Prosper ratings are even more closely related to the BorrowerRate than credit grades. The Prosper Rating is likely a metric that the Prosper Loan Company uses to assess risk, based on other parameters.

BorrowerRate v. StatedMonthlyIncome

Once again, I’m going to cut the StatedMonthlyIncome variable into quantiles for a cleaner plot.

Median BorrowerRate v. StatedMonthlyIncome Quantile

The median BorrowerRate varies consistently with the Quantile of StatedMonthlyIncome.

BorrowerRate v. IncomeRange

Not surprisingly, there is a similar variation in BorrowerRate amongst Borrowers with different Income Ranges.

BorrowerRate v. Employment Status

Unemployed borrowers have a significantly higher median BorrowerRate than the others

BorrowerRate v. IsBorrowerHomeowner

Homeowners have a lower median BorrowerRate than non-homeowners.

BorrowerRate v. BorrowerState

The worst States to get a loan are Maine and Indiana. The Best States are Alabama and North Dakota

BorrowerRate v. Occupation

In general, higher-paying occupations with a higher level of education (e.g. Judge, Computer Programmer, Engineer) have a lower median borrower rate than lower-paying occupations with lower levels of education (e.g. Teacher’s Aide, Nurse’s Aide, College Freshman).

Bivariate Analysis

——————-

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The median BorrowerRate decreased significantly from 2012 to 2014.

The Quantile of DebtToIncomeRatio and Median BorrowerRate are correlated.

CreditScore has a significant correlation to BorrowerRate

CreditGrade is derived from Credit Score

Prosper Ratings are highly related to BorrowerRate.

The Median BorrowerRate goes down as IncomeRange goes up.

StatedMonthlyIncomeQuantile and the Median BorrowerRate are highly correlated.

Unemployed borrowers have a significantly higher Median BorrowerRate than the others.

Homeowners have a lower median BorrowerRate than non-homeowners.

The worst States to get a loan are Maine and Indiana. The Best States are Alabama and North Dakota.

In general, higher-paying occupations with a higher level of education (e.g. Judge, Computer Programmer, Engineer) have a lower median borrower rate than lower-paying occupations with lower levels of education (e.g. Teacher’s Aide, Nurse’s Aide, College Freshman).

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

CreditGrade is derived from CreditScore.

ProsperRating determines the BorrowerRate almost perfectly, and is likely a score assigned by the prosper loan company based off of other parameters.

What was the strongest relationship you found?

The strongest relationship was between BorrowerRate and ProsperRating; however, ProsperRating itself is likely composed of other parameters, as noted above.

Aside from this, the relationship between BorrowerRate and CreditScore was the second strongest.

Multivariate Plots Section

—————————

In this section, my main objective is to see how the observed relationships from the previous section hold up and/or change over time.

To start off with, I will investigate the relationship between CreditGrade, Prosper Rating and Credit Score in more detail.

Median Borrower Rate v. ProsperRating Over Time

Median Borrower Rate v. CreditGrade Over Time

We can see from these two charts that the relationship between the Prosper rating and borrower rate and the relationship between the credit grade and borrower rate are fairly consistent over time.

An unintended but useful insight from the above visualizations is that the credit grade data goes up to Q2 2009, and the Prosper rating data ranges from Q3 2009 onwards. Given that both parameters are closely related to the BorrowerRate (as observed in the bivariate plots section), this suggests that the Prosper Loan Company used the borrower’s CreditGrade or credit score as the primary metric for determining the BorrowerRate up until Q2 2009, and used the Prosper Rating thereafter.

Next, let’s investigate how the correlation between BorrowerRate and CreditScoreRangeLower changes over time.

Correlation Between BorrowerRate and CreditScore by Quarter

The correlation between credit score and borrower rate varies significantly over time. This means it may be useful to model each quarter separately, or make models for different segments of time.

In particular, the correlation is higher up to Q3 2011 and goes down afterwards.
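A per-quarter correlation like the one plotted above can be computed by splitting the data on the quarter and calling `cor()` on each piece. The sketch below uses simulated data in which the relationship is deliberately made noisier in the later quarters; the quarter labels, slopes, and noise levels are invented for illustration and do not reproduce the report's results.

```r
set.seed(2)

# Simulated stand-in data (not the Prosper data)
quarters <- rep(c("Q1 2011", "Q2 2011", "Q3 2011", "Q4 2011"), each = 200)
score <- round(runif(length(quarters), 600, 850))

# Assumption for the demo: more noise (weaker relationship) in Q3 and Q4
noise_sd <- ifelse(quarters %in% c("Q1 2011", "Q2 2011"), 0.02, 0.08)
rate <- 0.60 - 0.0005 * score + rnorm(length(quarters), sd = noise_sd)

loans <- data.frame(LoanOriginationQuarter = quarters,
                    CreditScoreRangeLower = score,
                    BorrowerRate = rate)

# One Pearson correlation per quarter
cor_by_quarter <- sapply(
  split(loans, loans$LoanOriginationQuarter),
  function(d) cor(d$BorrowerRate, d$CreditScoreRangeLower)
)
print(round(cor_by_quarter, 2))
```

A sharp change in these values from one quarter to the next is the kind of signal that motivates fitting separate models for separate periods.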

LoanOriginalAmount by Quarter

The bottoms of the plots for 2006 through 2010 have an upwards slant. It appears LoanOriginalAmount may play a role in the BorrowerRate at least up until Q2 2010.

Correlation of BorrowerRate and LoanOriginalAmount by Quarter


Correlation between BorrowerRate and LoanOriginalAmount goes up in 2007 and back down around Q2 2013.

Median Borrower Rate v. Income Level by quarter.

The relationship between IncomeRange and BorrowerRate appears to be much stronger after 2009. To get a clearer picture of this, the StatedMonthlyIncome quantile is plotted against the median borrower rate by quarter below.

Median BorrowerRate v. StatedMonthlyIncome Quantile

It looks like Income is more closely correlated to the BorrowerRate after Q2 2009. This is verified in the chart below.

Correlation Between BorrowerRate and log10(StatedMonthlyIncome+1) by Quarter.

Correlation between StatedMonthlyIncome and BorrowerRate falls in Q2 2012

Next, we know from the previous DebtToIncomeRatio quantile v. median BorrowerRate line plot that BorrowerRate and DebtToIncomeRatio are related. Let’s try to find where that relationship is particularly strong.

BorrowerRate v. log transform of DebtToIncomeRatio by Quarter with constant credit score of 760

Correlation of BorrowerRate and log10(DebtToIncomeRatio+1) by Quarter

Correlation Between BorrowerRate and DebtToIncomeRatio is generally low, but goes up in Q2 2009, and back down in Q2 2011.

There appears to be a consistent pattern here. The methodology used to calculate the loan rates seems to change around Q2 2009, when the Prosper rating went into effect, and again around Q2 2011.

Linear Models

————

Linear Model for all Loans

## 
## Call:
## lm(formula = BorrowerRate ~ LoanOriginalAmount + CreditScoreRangeLower + 
##     log10(DebtToIncomeRatio + 1))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42386 -0.04944 -0.00974  0.04628  0.23322 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   5.500e-01  2.465e-03  223.09   <2e-16 ***
## LoanOriginalAmount           -2.710e-06  4.201e-08  -64.51   <2e-16 ***
## CreditScoreRangeLower        -5.258e-04  3.708e-06 -141.79   <2e-16 ***
## log10(DebtToIncomeRatio + 1)  1.969e-01  3.529e-03   55.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06473 on 79907 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared:  0.3203, Adjusted R-squared:  0.3202 
## F-statistic: 1.255e+04 on 3 and 79907 DF,  p-value: < 2.2e-16

The R^2 value for this fit is quite low.

Given the variation in the median borrower rate over time, as well as the variation over time in how strongly several variables correlate with the borrower rate, separate models for different periods of time may give better results.

We know the following from the previous plots:

Correlation between CreditScoreRangeLower and BorrowerRate goes down around Q1-Q2 2011.

Correlation between LoanOriginalAmount and BorrowerRate goes down around Q1-Q2 2011.

Correlation between StatedMonthlyIncome and BorrowerRate goes down around Q1 2011

Correlation between DebtToIncomeRatio and BorrowerRate goes up in Q2 2009 and back down in Q1 2011.

As such, it seems like good intervals to split the data would be from the beginning to Q2 2009, and from Q3 2009 through Q1 2011.

Model For loans in or before Q2 2009

## 
## Call:
## lm(formula = BorrowerRate ~ LoanOriginalAmount + CreditScoreRangeLower)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.287011 -0.031315 -0.007581  0.024863  0.215443 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            6.115e-01  2.605e-03  234.75   <2e-16 ***
## LoanOriginalAmount     1.911e-06  6.238e-08   30.64   <2e-16 ***
## CreditScoreRangeLower -6.819e-04  4.213e-06 -161.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05183 on 26918 degrees of freedom
## Multiple R-squared:  0.5076, Adjusted R-squared:  0.5076 
## F-statistic: 1.387e+04 on 2 and 26918 DF,  p-value: < 2.2e-16

Model for Loans Originating from Q3 2009 through Q1 2011

## 
## Call:
## lm(formula = BorrowerRate ~ LoanOriginalAmount + log10(DebtToIncomeRatio + 
##     1) + CreditScoreRangeLower)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.196538 -0.051013 -0.004821  0.050731  0.237776 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   1.020e+00  1.021e-02  99.948  < 2e-16 ***
## LoanOriginalAmount            1.661e-06  2.190e-07   7.585 3.69e-14 ***
## log10(DebtToIncomeRatio + 1)  1.471e-01  1.336e-02  11.010  < 2e-16 ***
## CreditScoreRangeLower        -1.182e-03  1.484e-05 -79.670  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06755 on 8152 degrees of freedom
##   (1013 observations deleted due to missingness)
## Multiple R-squared:  0.4953, Adjusted R-squared:  0.4951 
## F-statistic:  2667 on 3 and 8152 DF,  p-value: < 2.2e-16

Dividing the loans into separate periods produced much better models. The log transform of DebtToIncomeRatio was included in the second model but not in the first, where it did not improve the fit. This makes sense, given that the correlation between DebtToIncomeRatio and BorrowerRate is generally higher after Q2 2009.

Both models benefited from LoanOriginalAmount as expected.

The inclusion of Stated Monthly Income did not have an effect on either model, so it was left out. This means that the observed relationship may have had more to do with generally better CreditScore or DebtToIncomeRatio for borrowers with higher income.
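The effect of splitting by period can be illustrated with a small simulation. In the sketch below, the two periods share the same credit-score slope but sit at different rate levels, which is an invented scenario meant only to mimic a shift in rates over time. The variable names and coefficients are hypothetical; only the `lm()` formula shape follows the models above.

```r
set.seed(3)

# Simulated stand-in data (not the Prosper data)
n <- 400
period <- rep(c("early", "late"), each = n / 2)
score <- round(runif(n, 600, 850))
amount <- round(runif(n, 1000, 25000))

# Assumption for the demo: same slope, but rates sit 0.15 higher later on
level <- ifelse(period == "early", 0.60, 0.75)
rate <- level - 0.0006 * score + rnorm(n, sd = 0.03)
loans <- data.frame(period, score, amount, rate)

# One pooled model versus one model per period
pooled <- lm(rate ~ amount + score, data = loans)
by_period <- lapply(split(loans, loans$period),
                    function(d) lm(rate ~ amount + score, data = d))

r2 <- c(pooled = summary(pooled)$r.squared,
        sapply(by_period, function(m) summary(m)$r.squared))
print(round(r2, 3))
```

The pooled fit has to absorb the level shift as unexplained variance, so its R^2 is lower than either per-period fit. This is one plausible mechanism behind the improvement seen above, not a claim about what drives it in the real data.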

Multivariate Analysis

———————

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The average loan amount is closely correlated with the number of loans given out.

Credit score is correlated to Borrower Rate, although that correlation changes by quarter.

As of 2009, there is a strict cutoff at a credit score of 600.

There isn’t much variation in Borrower Rate by credit Score for those borrowers with a credit score of less than 600.

The correlation between the borrower rate and the loan amount becomes higher as the CreditGrade goes up.

The correlation between the borrower rate and the loan amount becomes higher as the Prosper Rating goes up.

A correlation between loan amount and borrower rate exists up until 2011.

The median borrower rate is related to IncomeRange.

The trend towards lower BorrowerRate for higher StatedMonthlyIncome is consistent across all of the quarters.

The trend towards higher BorrowerRate for higher DebtToIncomeRatio is consistent across all of the quarters.

The quantile of available monthly income is correlated with the median BorrowerRate.

Were there any interesting or surprising interactions between features?

CreditGrades were only used up until 2009, after which ProsperRatings were primarily used to determine BorrowerRate.

StatedMonthlyIncome varies significantly more across different ProsperRatings than it does across different CreditGrades.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

I created the following linear models, in order.

(The models apply to a subset of the data taken from 2006 to 2009 with a CreditScoreRangeLower of 600 or less.)

Using CreditScoreRangeLower and LoanOriginalAmount for all loans

R^2:0.2746

Using CreditScoreRangeLower and LoanOriginalAmount for loans originating before 2009

R^2: 0.5189

Using CreditScoreRangeLower, LoanOriginalAmount and log10(DebtToIncomeRatio+1) for loans originating from Q3 2009 through Q1 2011

R^2: 0.4953

The first model is the most general one, but has a very low R^2 value, and therefore does not represent the data very well.

The latter two models, with higher R^2 values, do a much better job of modelling the data. Neither model, however, is capable of taking into account the fluctuations in the BorrowerRate over time due to economic trends.

Final Plots and Summary

—————————-

Plot One, BorrowerRate v. Occupation

Description One

This plot shows borrower occupation on the y axis in order of median borrower rate, with highest median borrower rate at the top. BorrowerRate is shown on the x axis.

In general, higher-paying occupations with a higher level of education (e.g. Judge, Computer Programmer, Engineer, Professor) have a lower median borrower rate than do lower-paying occupations with lower levels of education (e.g. Teacher’s Aide, Nurse’s Aide, College Freshman, Clerical, Bus Driver).

While it is not clear whether the borrower’s occupation contributes directly to the borrower rate, it is interesting to see how the general prominence of the borrower’s occupation relates to the BorrowerRate.

Plot Two

Description Two

This plot shows the median BorrowerRate, by quarter.

We can see from this plot that the median borrower rate rises in 2010 and falls in 2013. More generally, this plot demonstrates the variable nature of the BorrowerRate.

Given the relatively large number of loans, one might otherwise expect the median borrower rate to remain relatively stable over time. This variation means that the borrower rate is subject to economic trends over time, in addition to the characteristics of the borrower and the original loan terms.

Plot Three

## [1] "R^2:"
## [1] 0.5075882

## [1] "R^2:"
## [1] 0.4953332

Description Three

The above plots show two linear models of the Borrower Rate, the first for loans up to Q2 2009, and the second for loans from Q3 2009 through Q1 2011.

This shows that variation in the BorrowerRate from Q1 2006 to Q2 2009 is explained primarily by the LoanOriginalAmount and the Credit Score for the borrowers with a credit score of greater than 0. Additionally, variation in the BorrowerRate from Q3 2009 through Q1 2011 is explained primarily by the LoanOriginalAmount, the Credit Score and the DebtToIncomeRatio.

The fact that DebtToIncomeRatio played a role in the linear model for the second period, but not in the first, is interesting. According to the National Bureau of Economic Research, the recession ended in the second quarter of 2009, which is also the end of the first model and the start of the second. This may imply that the ability of the borrower to pay the loan was taken more seriously after the recession.

Reflection

The Prosper loan data set contains information on more than 100,000 loans from 2006 to 2014. My objective was to explore trends and factors contributing to the borrower rate. I started out by gaining an understanding of each of the variables in the investigation, and then went on to investigate the relationship between each variable and the borrower rate. I then explored the strongest relationships in further detail in order to get a better sense of when and where these relationships were strongest. Finally, I created a linear model using LoanOriginalAmount and CreditScoreRangeLower. This linear model was insufficient, so I created separate linear models for loans up to Q2 2009 and from Q3 2009 through Q1 2011.

The biggest struggle was working out how the different variables related to the BorrowerRate over time. For example, DebtToIncomeRatio did not appear to be correlated at all with the BorrowerRate until the correlation was taken by quarter. Another example is that StatedMonthlyIncome seemed to have a relationship to BorrowerRate, but was found to be insignificant once it was applied in the linear model.

The model is limited primarily by the small number of parameters used in the investigation. The data set contains 81 features, of which several (in addition to those used) may have some influence on the BorrowerRate, including BorrowerRateCategory, Recommendations, Investment from friends, and Investors. The model is also limited by fluctuations in the median BorrowerRate over time. Further investigation could therefore use a wider array of variables to try to get a better model of the borrower rate. It could also take contextual economic data into account in order to explain the general fluctuations in BorrowerRate over time. For example, I could separate the data by BorrowerRateCategory and try to explain the variation within each category, or I might get data on the national interest rate and use that as a parameter or multiplier in the models.